Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make deepHashCode unique depending on order #101

Closed
wants to merge 1 commit into from
Closed

Make deepHashCode unique depending on order #101

wants to merge 1 commit into from

Conversation

MrNavaStar
Copy link

Currently DeepEquals.deepHashCode() does not care about order. This means that these two lists will be equal:

image
image

The proposed changes fix this behaviour, making only truly identically objects match. This is of course a breaking api change.

Thanks for making this awesome library, Cheers!

@jdereg
Copy link
Owner

jdereg commented Feb 16, 2024

I want to use an extreme example to make a point, and then hopefully you will see that the hashCode() ordering you've brought up, may not really be a concern. One can have a hashCode() method return 1, for example. Yes, that would perform awful, but hashCode() is not .equals(). At the other extreme, one can have a robust finely grained hashCodes() that are usually unique() - makes code run faster.

Given the above, I am trying to see why you need this behavior. Hash code is not equals(). Also, the change you've made modifies the hashCode() function that someone else intentionally wrote - your proposed code calls their custom method (like the code was doing before) but then multiplies it and adds to it (tweaks the value). Given that hashCodes() alone do not determine equality - hashcodes can rule out that objects are NOT equal, but they cannot rule that objects are equal - I am trying to understand why this proposed code change is needed?

@MrNavaStar
Copy link
Author

MrNavaStar commented Feb 16, 2024

Thanks for the quick response! I am working on a networking library that relies on packets being registered in the same order on the client and server. Having the hash code be unique based on order is quite useful in this case as it allows be to determine if the client and server have different versions.

I'm a little confused on your argument here. You said that hash codes can only rule that objects are not equal. In my example, the two lists are not equal, and yet the hash codes are the same. Perhaps I am misunderstanding something.

Either way I only see benefits to the change. I don't see how bringing order into the hash would cause any issues (as it's how most programs calculate hashes, ArrayLists implementation for example) but if I'm mistaken please let me know.

If this is not something your interested in that is perfectly acceptable, and I wish you the best 😊

@jdereg
Copy link
Owner

jdereg commented Feb 16, 2024

I want to make sure we both are in agreement on how hashCode() works. The hashCoce() function can be written to generate whatever value it wants for an instance, but it has to generate the same value always for the same instance. Therefore, returning 1 always, no matter what, is a valid hashCode() function. It would seriously slow things down in a HashMap lookup, but it would work. This extreme example makes it understandable that two or more items may generate the same hashCode() - it can happen often and is perfectly fine. HashCode() is never guaranteed to be unique. The closest you will get to something like that, would be to use a cryptographic function like SHA-256/SHA-512 to generate a hash, however, when you go to fit it back into an integer, there will be hash collisions. (SHA-256 has no known duplicates (rainbow pairs) for different inputs discoverted yet, but eventually they could be found - by an advanced quantum computer).

Now given that, would one really want the items in a list to generate a different hash because the items were ordered differently? Shouldn't the hash of the list be the sum of the hash of the items in the list? For a Set, for example, the sums would absolutely have to be the hash - as two sets ordered differently, are still consider equivalent [.equals()] per the definition of the .equals() contract. However, two Lists that are ordered differently are not considered equivalent. If one is going to change the hashCode() to include order, that can only be done on any container that considers the order of the items to be part of the equivalence. So all Set derivatives would have to use an order indepedent hash() and a List derivatives may or may not (doesn't matter) choose to include order within the hashCode() function.

It would be an error to include order in a hashCode() function that, for example, was used by a Set. Because the hashCode() and .equals() functions would be in disagreement. Two sets are .equals() if the elements in one set are in the other set (and vice-versa). However, if order were part of the hashCode() for the Set, then two equivalent Sets with different ordering of the same items would have a different hashCode(), yet their equals() function would say they are equivalent. That breaks the contract that two items that are .equals() must both have the same hashCode() value.

@MrNavaStar
Copy link
Author

MrNavaStar commented Feb 17, 2024

Okay I guess that's a fair argument, but are you saying that the way the standard library implements hash code for lists is wrong? Because they do care about order. My code changes are just copied from the way it's done for arraylists.

Edit:
Never mind, I guess lists can implement that behaviour because they only apply to themselves. Something else (like a hashset) would not want that behaviour applied when using it.

I guess this can be closed. Sorry for the confusion.

@jdereg
Copy link
Owner

jdereg commented Feb 17, 2024

I appreciate your ideas and efforts here. As you pointed out, in this particular piece of code, it doesn't know if it is working with a List or not. However, we could perform and instanceof check here, and then conditionally perform the hashCode() function as you proposed, otherwise not. We need to truly think about which JDK Collection classes care about order versus not. List is a good starting point. An OrderedSet, for example, I'd have to look at implementations, but I don't believe they could change their hashCode() based on order, because fundamentally, a Set is equivalent regardless of order. The ordered nature of OrderSet affects how elements are addressed when walking through them, but I do not believe it overridees the contract of equals() and hashCode(). I'm thinking this can be implemented, we just need to ensure we have cordoned off the appropriate types here, and it may only be a List check. I'll do something checking today.

@MrNavaStar
Copy link
Author

MrNavaStar commented Feb 17, 2024

I think it's probably only lists that need fixing. Any jdk classes should implement hash code already one way or another. I'll think about this some more as well and see if I can come up with anything else.

I Appreciate your feedback!

@jdereg
Copy link
Owner

jdereg commented Mar 11, 2024

I updated deepHashCode() to consider List order and Array order in its hashCode() computation. While at it, I updated it make floats and doubles influence hash as long as they are relatively near each other. DeepEquals.hashCode() honors the hashCode() and equals() contract (it did before, but it provides better hashes now).

Before two lists or two arrays in different order with the same values returned the same hash. This is legal, but not as fast as allowing the hashes to differ in this case. Please understand that hashes values are NEVER gauranteed to be unique. However, the more they differ, it allows for quick test for not equals: If two items have different hashCodes(), they are certainly not equal. If two items have the same hashCode(), they may or may not be equals.

Anyway, the changes are in code in origin/head. I will be releasing a 2.5.0 version to Maven Central pretty soon that will contain this update.

Thanks for finding this opportunity to optimize the code.

@jdereg jdereg closed this Mar 11, 2024
@MrNavaStar
Copy link
Author

Awesome, and thanks!

@MrNavaStar MrNavaStar deleted the patch-1 branch March 13, 2024 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants